Here we will make use of the data introduced in the XMM Image Intro notebook. Many of our early examples of generative models, as well as the Bayes on a Grid and Gibbs/Metropolis tutorials, were based around the task of performing photometry on image data, specifically with images sparse enough that the number of photon detections per pixel is typically small. This is exactly the regime we are in with the XMM data. Specifically, you will
It might sound like there's not much new here, and that's somewhat true. This tutorial is more about putting into practice what you've learned with less scaffolding than we normally provide. It's also the case that this data set is not entirely trivial to wrestle with (and that's with us already cutting some corners, as usual). Plus, we thought it was silly to have a course in statistical methods in astrophysics that didn't model and fit an image of the sky sooner or later.
TutorialName = 'xmm2'
exec(open('tbc.py').read()) # define TBC and TBC_above
import astropy.io.fits as pyfits
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from astropy.visualization import LogStretch
logstretch = LogStretch()
import scipy.stats as st
from scipy.optimize import minimize
The class below is from the XMM Image Intro notebook, with one addition: a mask array which holds a True or False for each pixel, telling us whether we should pay attention to that pixel in the likelihood. This allows us to, by default, ignore the pixels where the exposure map is zero. We'll also use it to ignore most point-like sources.
class Image:
    def __init__(self, imdata, exdata, imx=None, imy=None):
        self.im = imdata
        self.ex = exdata
        self.mask = (self.ex > 0.0)
        if imx is None or imy is None:
            # Note the swapped order of shape[1] and shape[0]. This is the only time we should need to remember this!
            self.imx, self.imy = np.meshgrid(np.arange(imdata.shape[1]), np.arange(imdata.shape[0]))
        else:
            self.imx = imx
            self.imy = imy
    def cutout(self, x0, x1, y0, y1):
        # return the subset of self bounded by pixel indices x0, x1, y0 and y1 (inclusive)
        return Image(self.im[y0:y1+1,x0:x1+1], self.ex[y0:y1+1,x0:x1+1],
                     self.imx[y0:y1+1,x0:x1+1], self.imy[y0:y1+1,x0:x1+1])
    def extent(self):
        return [np.min(self.imx), np.max(self.imx), np.min(self.imy), np.max(self.imy)]
    def display(self, log_image=True):
        fig, ax = plt.subplots(1,2);
        extent = self.extent()
        if log_image:
            ax[0].imshow(logstretch(self.im*self.mask), cmap='gray', origin='lower', extent=extent);
            ax[0].set_title('image (log scale)');
        else:
            ax[0].imshow(self.im*self.mask, cmap='gray', origin='lower', extent=extent);
            ax[0].set_title('image');
        ax[1].imshow(self.ex*self.mask, cmap='gray', origin='lower', extent=extent);
        ax[1].set_title('exposure map');
As before, we'll read in and display the data...
datadir = 'data/'
imagefile = datadir + 'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = datadir + 'P0098010101M2U009EXPMAP3000.FTZ'
imfits = pyfits.open(imagefile)
exfits = pyfits.open(expmapfile)
im = imfits[0].data
ex = exfits[0].data
orig = Image(im, ex)
plt.rcParams['figure.figsize'] = (10.0, 10.0)
orig.display()
... and crop the image slightly to better isolate the actual data.
data = orig.cutout(100, 580, 100, 580)
data.display()
As discussed in the Data Intro, the image contains a cluster of galaxies and a number of point-like Active Galactic Nuclei (AGN), most of which are relatively faint and may not be obvious in the images above. In some previous year, we made a by-eye list of pixel $x,y$ positions and radii (also in pixels) for the AGN, where the radii are supposed to encompass the regions where the AGN brightness clearly exceeds the uniform background. As you might guess, there are more sophisticated ways of doing this, but this works pretty well as an approach to removing bright point-like sources (provided you're willing to spend the human time making such a list).
pts = np.loadtxt(datadir+'P0098010101M2U009_ptsrc.txt')
pts
array([[231. , 398. , 9.6640149],
[187. , 417. , 6.6794089],
[361. , 473. , 7.8936824],
[335. , 416. , 5.8364482],
[380. , 358. , 3.8626665],
[390. , 417. , 5.5885783],
[397. , 293. , 3.538914 ],
[416. , 208. , 5.2474726],
[270. , 215. , 5.3269609],
[299. , 211. , 6.0974003],
[285. , 161. , 3.7078355],
[344. , 152. , 4.8141418],
[167. , 360. , 5.6344116],
[196. , 247. , 4.6760734],
[276. , 233. , 5.0308471],
[240. , 211. , 4.1267598],
[250. , 378. , 4.4363759],
[309. , 412. , 2.5081459],
[459. , 286. , 5.9048854],
[441. , 352. , 4.6259039],
[287. , 267. , 4.4204645],
[147. , 316. , 4.7704631],
[150. , 285. , 7.9281045],
[222. , 238. , 5.561259 ],
[489. , 405. , 4.0450217],
[480. , 317. , 4.7402437]])
As we'll discuss more in the Model section below, we'll want to mask (set data.mask[i,j]=False) all of these AGN except for the brightest one, which not coincidentally has the largest "radius" in the table above. Do so.
TBC() # set data.mask=False within all of the AGN masks but the one with the largest radius
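If you'd like a starting point, one way to mask circular regions is sketched below. `mask_circles` is a hypothetical helper (not part of the course scaffolding) that operates on arrays shaped like `data.mask`, `data.imx` and `data.imy`, and `skip` is the row index of the source to leave unmasked.

```python
import numpy as np

def mask_circles(mask, imx, imy, sources, skip=None):
    # sources: array of (x, y, radius) rows, as in the `pts` table
    # skip: index of one row to leave unmasked (e.g. the largest-radius AGN)
    for i, (x, y, r) in enumerate(sources):
        if i == skip:
            continue
        # set mask=False for all pixels within r of (x, y)
        mask[(imx - x)**2 + (imy - y)**2 <= r**2] = False
    return mask
```

With `data`, this would look like `mask_circles(data.mask, data.imx, data.imy, pts, skip=np.argmax(pts[:,2]))`.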
Let's check that this did what we want by re-displaying the data; pixels where data.mask==False are set to zero in the displayed images.
data.display()
You might be wondering why we're choosing to leave one AGN unmasked here. The idea is that, since we know the AGN is tiny compared with the telescope's PSF, it could provide us with a way to measure the size of the PSF from the image itself. In real life we would never do this - XMM is a space telescope that doesn't look through the atmosphere, so its PSF is plenty stable and can be well determined from observations of even brighter AGN. On the other hand, there are times when an AGN is projected onto an extended source of interest, and we want to model them both rather than throwing away a chunk of the target. So the problem we'll be solving is, at least, illustrative.
To recap, our model needs to explain
We'll address each of these in turn.
We'll use a common parametric model for the surface brightness (counts per unit time per pixel) of galaxy clusters: the azimuthally symmetric beta model,
$S_\mathrm{CL}(r) = S_0 \left[1.0 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2}$,
where $r$ is projected distance from the cluster center. Note that this model describes a 2D surface brightness distribution, since $r^2 = (\Delta x)^2 + (\Delta y)^2$. Also note that the pixel spacing is a perfectly respectable unit of projected distance for our purposes.
To reduce parameter degeneracies to a manageable level (remember how we like to do that?), we generally make the log of $S_0$, rather than $S_0$ itself, a parameter. In that case, to be explicit, the parameters of this part of the model are:
To get a little intuition for what this function looks like, write a function that evaluates $S(r; S_0, r_c, \beta)$ and make a (log-log) plot of it for a few choices of parameter values, showing the limiting behavior for $r\ll r_c$ and $r\gg r_c$.
TBC() # function definition and plot
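As a sketch (one possible implementation; the plotting itself is left to you), the beta model is a one-liner. The limiting behavior to look for: $S \to S_0$ for $r \ll r_c$, and a power law $S \propto r^{-(6\beta-1)}$ for $r \gg r_c$.

```python
import numpy as np

def beta_model(r, S0, rc, beta):
    # azimuthally symmetric beta model surface brightness, S(r; S0, rc, beta)
    return S0 * (1.0 + (r/rc)**2)**(-3.0*beta + 0.5)
```

On a log-log plot, you should see a flat core out to roughly $r_c$, then a straight line with slope $-(6\beta-1)$.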
The unmasked AGN in our image is point-like (really, it is), so we can describe it using just
Of course, the AGN doesn't appear point like because it's been smeared by the PSF, which we'll get to below. If we wanted to define a surface brightness for the AGN, it would involve a Dirac delta function, as in
$S_\mathrm{AGN}(x,y) = F_a \delta(x-x_a) \delta(y-y_a)$.
As described in the Data Intro, there are several contributors to the background, some of which are vignetted by the telescope optics and some of which are not. Breaking this down would be too much for us to deal with from just this one data set. Our expectation (not actually verified here) is that the particle-induced background, which is roughly uniform and unvignetted, is generally dominant by far, so we will model only that component. This gives us a single parameter,
Note that there is no "per second" in this case because (see below) an unvignetted model will not be multiplied by the exposure map. (This is purely a conventional thing. We could certainly parametrize the background as the count rate per pixel, using the actual exposure time of the observation. But, given that the data products mix the exposure time and vignetting correction into a single map, the approach above is simpler.)
The models for the cluster and AGN above describe the X-rays that arrive at the telescope aperture. Two subsequent effects need to be accounted for before we can make a prediction for the data, which is the number of counts in each image pixel. First, the PSF smears out the image, and second, the exposure map simultaneously accounts for the vignetting (reducing the apparent brightness of everything far away from the center of the field of view) and the exposure time (converting count rates to counts). As defined above, the expectation for the background is uniform, and needs neither of these corrections (it's already a prediction for the counts in each pixel).
Putting this all together, the mean counts/pixel predicted by the model can be written
$\mu(x,y) = b + T(x,y) \cdot \mathrm{PSF} \otimes \left[ S_\mathrm{AGN} + S_\mathrm{CL} \right]$,
where $T(x,y)$ is the exposure map and $\otimes$ indicates a convolution.
The exposure map is given to us, and we don't have a basis for assigning any uncertainty, so we'll take it to be fixed. The PSF we'll model as a symmetric 2D Gaussian; this is not entirely accurate, but will do for our purposes. This gives us yet another parameter,
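One payoff of the Gaussian PSF choice is that the PSF-convolved AGN term can be written down analytically: a delta function convolved with a Gaussian is just that Gaussian, normalized to the total flux. A sketch (with `agn_image` a hypothetical helper name, working in pixel units, before multiplication by the exposure map):

```python
import numpy as np

def agn_image(imx, imy, xa, ya, Fa, rpsf):
    # PSF-convolved point source: a symmetric 2D Gaussian of width rpsf
    # centered on (xa, ya), normalized so that it integrates to total flux Fa
    return Fa/(2.0*np.pi*rpsf**2) * np.exp(-0.5*((imx - xa)**2 + (imy - ya)**2)/rpsf**2)
```

The $1/(2\pi R_\mathrm{psf}^2)$ prefactor is what makes the image sum to $F_a$ when the source is well inside the field.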
Over to you: draw the PGM and enumerate the corresponding probabilistic relationships for the model.
TBC() # answer in Markdown
Solution not given - we will work together on this in class if needed.
For simplicity and reproducibility, we'll use wide, uniform priors on each of the parameters defined above, with the stipulation that the cluster center and AGN must be located within the image held by data. In addition there are the usual mathematical/physical bounds to do with parameters encoding distances being non-negative, and our understandable desire to not divide by zero. Note that there is also a physical bound on $\beta$ from the requirement that the total flux of the cluster be finite. For completeness, list the prior you will use for each parameter here.
TBC() # answer in Markdown
Solution not given - we will work together on this in class if needed.
We're not going to dictate what algorithm you use to find the posterior distribution. Before doing so, however, we'd like you to find an approximate posterior mode using scipy.optimize.minimize. Experience has shown that this is an exceptionally good idea for this problem, rather than trying to start a standard MCMC chain from a guessed initial position.
For concreteness, write a function called cash that returns $-2\ln(\mathrm{posterior})$ which you can minimize to find the posterior mode. (For the sampling distribution relevant to this problem, an author named Cash was responsible for pointing out the appropriate functional form, and the fact that it isn't $\chi^2$.) cash should take a parameter array as its argument, since that's what minimize expects to work with.
For most MCMC samplers you might use, writing a log-posterior function that cash simply calls would be most convenient. But that's up to you.
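For orientation, the Poisson sampling distribution gives, up to a parameter-independent constant, $-2\ln(\mathrm{likelihood}) = 2\sum_{ij} \left(\mu_{ij} - N_{ij}\ln\mu_{ij}\right)$ over unmasked pixels; this is the Cash form. A toy sketch of just that piece (not the full cash function, which must also fold in your priors and the model evaluation):

```python
import numpy as np

def cash_stat(N, mu):
    # -2 ln(Poisson likelihood) of counts N given means mu,
    # dropping the parameter-independent ln(N!) term
    return 2.0 * np.sum(mu - N * np.log(mu))
```

Note that the dropped $\ln(N!)$ term is constant in the parameters, so it has no effect on the mode or the posterior shape.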
Your implementation should also include a separate function, modelImage, that evaluates $\mu(x,y)$ as given above. Being able to visualize the model prediction will make debugging much easier, trust us. To enable code in the notebook below to work, the argument list for this function should be the list of parameters in the order we define below (the actual argument names can vary):
param_names = ['b', 'x0', 'y0', 'lnS0', 'rc', 'beta', 'rpsf', 'xa', 'ya', 'lnFa']
param_labels = [r'$b$', r'$x_0$', r'$y_0$', r'$\ln S_0$', r'$r_c$', r'$\beta$', r'$R_\mathrm{psf}$',
r'$x_a$', r'$y_a$', r'$\ln F_a$']
One last, very important thing. This is going to be (for this class) a relatively expensive posterior to evaluate, at least if coded naively. Therefore, please do not actually perform the PSF convolution of the cluster model (and recall that the convolution is analytic for a point-like source). This is purely to save us time. However, in this particular case the cluster happens to be much more extended than the PSF, so it won't have a huge impact on the model we fit. For a much less massive or more distant cluster, this would be a truly terrible corner to cut.
Off you go then.
def modelImage(b, x0, y0, lnS0, rc, beta, rpsf, xa, ya, lnFa):
    # return an image containing the predicted expected counts in each pixel, mu(x,y)
    TBC()

def cash(params):
    # return -2*log(posterior)
    # `params' is an array of parameters in canonical order (given by `param_names', above)
    # accessing `data' from global scope rather than through an argument is highly recommended here - it's bad
    # coding style but better coding efficiency
    TBC()

TBC_above()
Next, we'll want initial values for each parameter. This problem turns out to be pretty sensitive to the initial guess - not only will Markov chains converge very slowly if we start them in a silly place (which is one reason we're finding the posterior mode first), but even the optimization will struggle if the initial guess is too unreasonable. So it's worth taking some time to do this right. Here are my suggestions:
If you have a tool like SAOImage DS9, some of those suggestions will be much easier to check out, but they can also all be done here in the notebook. Either way, fill in the guess dictionary below, including your reasoning in comments if it differed from above:
TBC()
# guess = {'b':...,
# 'x0':...,
# 'y0':...,
# 'lnS0':...,
# 'rc':...,
# 'beta':...,
# 'rpsf':...,
# 'xa':...,
# 'ya':...,
# 'lnFa':...,
# }
guessvec = [guess[k] for k in param_names]
Now we can have a look at the model image produced by this guess (and revise the guess if need be):
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.imshow(logstretch(modelImage(*guessvec)), origin='lower', extent=data.extent());
And we can make sure that the cash function returns something:
cash(guessvec)
-9773.836709335745
Before turning things over to scipy.optimize.minimize, we need to be able to tell it the allowed range for each parameter. This is done through a list (in parameter order) of lower and upper bounds, with None standing for $\pm\infty$. Define this, consistent with your priors above, here.
TBC()
# bounds = [(...,...), # b
# (...,...), # x0
# (...,...), # y0
# (...,...), # lnS0
# (...,...), # rc
# (...,...), # beta
# (...,...), # rpsf
# (...,...), # xa
# (...,...), # ya
# (...,...), # lnFa
# ]
Finally, let's run the minimizer (expect this to take 30-60 seconds, depending on your guess) ...
%%time
bestfit = minimize(cash, guessvec, bounds=bounds)
bestfit
CPU times: user 35.4 s, sys: 2.91 s, total: 38.3 s
Wall time: 39 s
fun: -12487.970144214709
hess_inv: <10x10 LbfgsInvHessProduct with dtype=float64>
jac: array([ 1.69166015, -0.27757756, -0.11150396, 0.48385118, 0.05820766,
-1.33622961, 0.2240995 , 0.0152795 , 0.03856254, -0.04147296])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 1210
nit: 97
njev: 110
status: 0
success: True
x: array([ 5.23732392e-02, 3.27798645e+02, 3.46903542e+02, -5.01894262e+00,
3.81792566e+00, 5.79246076e-01, 1.76044145e+00, 2.31638105e+02,
3.98260414e+02, -3.42714183e+00])
... and look at the result a little more naturally:
bestvec = bestfit.x
best = {k:v for k,v in zip(param_names,bestvec)}
best
{'b': 0.05237323923992111,
'x0': 327.79864465482206,
'y0': 346.9035415919116,
'lnS0': -5.018942620384719,
'rc': 3.8179256552032106,
'beta': 0.579246076027404,
'rpsf': 1.7604414474356018,
'xa': 231.63810504366427,
'ya': 398.26041431031706,
'lnFa': -3.4271418335360297}
Checkpoint: it should be possible for the minimizer to run successfully, i.e. without giving up and/or throwing an error.
Let's see if this set of parameters looks any different by eye:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.imshow(logstretch(modelImage(*bestvec)), origin='lower', extent=data.extent());
Next, you'll produce samples from the posterior, using your choice of method. Be warned that this posterior function (at least as I implemented it) is relatively slow. If you are using an MCMC method, do make sure you've converged, but after that it isn't necessary to run super-long to get lots of minimally correlated samples. (For a real research project we would either bite the bullet and run for a long time, or, more likely, do a better job coding.)
In your notebook, we expect to see:
If you use nested sampling or some other non-MCMC method, substitute the equivalent diagnostics and plots.
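If you go the MCMC route and want something dependency-free, a random-walk Metropolis sampler is only a few lines. This is a sketch, not a recommendation: the per-parameter proposal scales would need tuning for this posterior, and packages like emcee are also perfectly fine choices.

```python
import numpy as np

def metropolis(logpost, start, prop_scale, nsteps, rng=None):
    # random-walk Metropolis: Gaussian proposals with per-parameter widths prop_scale
    rng = np.random.default_rng() if rng is None else rng
    current = np.array(start, float)
    lp = logpost(current)
    chain = np.empty((nsteps, len(current)))
    for i in range(nsteps):
        trial = current + prop_scale * rng.standard_normal(len(current))
        lp_trial = logpost(trial)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_trial - lp:
            current, lp = trial, lp_trial
        chain[i] = current
    return chain
```

To use it here you would pass `lambda p: -0.5*cash(p)` as `logpost` and `bestvec` as `start`, and remember to discard burn-in and check convergence as usual.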
TBC()
NB: The chains shown here are much longer than you need to run
R = [1.01798858 1.01558145 1.01160828 1.01367267 1.01910798 1.02041055 1.02139222 1.02283644 1.01871441 1.01075227]
neff = [602.63674249 557.63489128 672.07873237 572.98434479 457.47209123 447.43826293 492.26637429 627.23467224 533.25044708 651.84148079]
NB: Since walkers are not independent, these will be optimistic!
Checkpoint: As we've done before, we can quickly compare your posterior samples to our own solutions. If your samples are not equally weighted, you will need to add the weights argument to plotGTC below.
solution = np.loadtxt('solutions/xray.dat.gz')
plotGTC([samples, solution], paramNames=param_labels, chainLabels=['yours', 'ours'],
figureSize=16, customLabelFont={'size':12}, customTickFont={'size':12}, customLegendFont={'size':16});
As ever, we're not done until we've done at least a simple check that the fitted model is consistent with generating the data. As in our toy photometry problem from earlier, we can do a quick frequentist goodness-of-fit test of the best fit (whose parameters we saved in best and bestvec) by comparing the Cash statistic of the best fit with its theoretical expectation if the model is correct. You can refer to that notebook for the basic outline. Here, go a little further by computing the p-value associated with $C$, i.e. the probability of getting a $C$ for the best fit that is at least as large as what we did get.
Note that the theoretical expectations computed by the cashstatistic package are for a particular form of the Cash statistic which may or may not differ from your cash function by a constant, so you'll want to also use the package's function to compute $C$ from the best fit for the comparison (or at least check that your function is consistent).
Below, compute and print
TBC()
Best-fit C: 60362.28369598087
Theoretical prediction: 59706.646304825816 +/- 349.66052660256963
Pee value: 0.030391659643704765
Checkpoint: I get a pee value of ~0.03. So-so.
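The checkpoint number can be reproduced from the values printed above using a Gaussian approximation to the distribution of $C$ (the specific numbers below are from our run; yours will differ):

```python
import scipy.stats as st

# best-fit C and its theoretical mean and standard deviation, from the output above
C_best = 60362.28369598087
C_mean = 59706.646304825816
C_sd = 349.66052660256963

# one-sided tail probability: chance of a C at least this large if the model is correct
p = st.norm.sf(C_best, loc=C_mean, scale=C_sd)
print(p)  # ~0.03
```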
To get a feel for specifically which part of the model might be less than ideal, make a visualization that compares $S(r)$, azimuthally averaged brightness as a function of radius about the best-fit cluster center, from the data and a number of posterior samples. Looking at the image, we would expect to see a steady drop from the cluster center outwards, with a small bump due to the AGN, eventually approaching a constant (if the background is indeed uniform). The AGN itself will be washed out by averaging it with all the other azimuths at the same radius, so a second plot centered on the AGN position, out to smaller radial distance, might be a good idea.
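A generic azimuthal-averaging building block might look like the following. This is a sketch, assuming an `Image`-like object's `imx`/`imy` pixel grids and `mask`; the comparison plot of data versus posterior samples is up to you.

```python
import numpy as np

def radial_profile(image, mask, imx, imy, xc, yc, rmax, nbins):
    # mean of `image` over unmasked pixels in annular bins about (xc, yc)
    r = np.sqrt((imx - xc)**2 + (imy - yc)**2)
    keep = mask & (r < rmax)
    bins = np.linspace(0.0, rmax, nbins + 1)
    idx = np.digitize(r[keep], bins) - 1          # bin index for each kept pixel
    counts = np.bincount(idx, minlength=nbins)    # pixels per bin
    sums = np.bincount(idx, weights=image[keep], minlength=nbins)
    centers = 0.5*(bins[:-1] + bins[1:])
    with np.errstate(invalid='ignore'):           # empty bins -> NaN rather than an error
        return centers, sums / counts
```

Running it on `data.im` and on model images from posterior samples, both about the best-fit cluster center, gives the curves to overplot; a second call centered on the AGN position probes the PSF-dominated regime.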
TBC()
Solution not given - we will work together on this in class if needed.
Comment on what you see in light of the Cash statistic test above.
TBC() # answer in Markdown
Solution not given - we will work together on this in class if needed.
Here we are, finally having done some analysis of an astronomical image. Specifically, you've now performed X-ray photometry on a point-like source and an extended source. Admittedly, we cut a few corners, but this is still a more complete job than a surprising amount of the literature does. Hopefully you also learned a few things about juggling and fitting models to real data.